Our honorary doctor, Harald Baayen, Professor of Quantitative Linguistics at the University of Tübingen, will return to Tartu at the beginning of December. During his visit, he will lead a workshop on using word embeddings in linguistic analysis.
Word embeddings are high-dimensional numeric representations of word meaning, derived from large corpora. They are widely used in AI and NLP. In this workshop, I will first introduce some methods for calculating embeddings and discuss both their strengths and weaknesses. I will then provide a series of worked examples using Jupyter notebooks running R, which will enable participants to replicate several published studies and to apply the methods to their own data.
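To give a concrete sense of what an embedding is before the workshop, the small R sketch below uses toy random numbers (it is not workshop code): each word is represented as a row of numbers, and semantic relatedness is commonly measured as the cosine of the angle between two such rows. With real pre-trained embeddings, the vectors would be read from a large corpus-derived file rather than simulated.

# Toy illustration of embeddings and cosine similarity (not workshop code)
set.seed(42)
words <- c("cat", "dog", "carburetor")
vecs  <- matrix(rnorm(3 * 300), nrow = 3, dimnames = list(words, NULL))

# Cosine similarity between two embedding vectors
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a * a)) * sqrt(sum(b * b)))

cosine(vecs["cat", ], vecs["dog", ])         # with real embeddings: relatively high
cosine(vecs["cat", ], vecs["carburetor", ])  # with real embeddings: lower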
One set of worked examples will illustrate the finding that the change from
singular to plural in semantic space can depend on semantic class (English) or
case (Russian, Finnish). A second set of worked examples will illustrate how
embeddings can be used to study semantic transparency and productivity, using
data from Mandarin Chinese; the CAOSS and FRACSS models of Marelli and
colleagues will be discussed in detail, using data drawn from English. A third
worked example will show how similarities and differences in the cognitive
organization of lexical semantic space of different languages (Mandarin Chinese
and English) can be brought to light, using Procrustes rotation to place
language-specific embeddings in a joint semantic space. A final worked
example will show how embeddings can be used to predict the fine phonetic
detail of the pitch contours of English left-stressed two-syllable words.
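As background for the third worked example, the following R sketch shows, on toy data, the kind of orthogonal Procrustes alignment referred to above: two language-specific embedding matrices, with rows paired across translation equivalents, are rotated into a joint space. The matrices and dimensions here are made up for illustration; the workshop notebooks will use real Mandarin Chinese and English embeddings.

# Orthogonal Procrustes alignment of two embedding spaces (toy data)
set.seed(1)
X <- matrix(rnorm(50 * 10), nrow = 50)   # e.g. Mandarin embeddings, one row per concept
Y <- matrix(rnorm(50 * 10), nrow = 50)   # e.g. English embeddings, rows paired with X

# Centre both spaces
Xc <- scale(X, center = TRUE, scale = FALSE)
Yc <- scale(Y, center = TRUE, scale = FALSE)

# Rotation R minimizing ||Xc %*% R - Yc|| in the least-squares sense,
# obtained from the SVD of the cross-product matrix
s <- svd(t(Xc) %*% Yc)
R <- s$u %*% t(s$v)

X_aligned <- Xc %*% R          # first language rotated into the second language's space
sum((X_aligned - Yc)^2)        # residual: smaller values indicate more similar organization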
The workshop takes place on December 2, from 2 to 4 pm, in room Jakobi 2-438.
Please register here.